AITopics | bridge network

bridge network

MCDDPM: Multichannel Conditional Denoising Diffusion Model for Unsupervised Anomaly Detection in Brain MRI

Trivedi, Vivek Kumar, Sharma, Bheeshm, Balamurugan, P.

arXiv.org Artificial Intelligence, Sep-29-2024

Detecting anomalies in brain MRI scans using supervised deep learning methods presents challenges due to anatomical diversity and the labor-intensive requirement for pixel-level annotations. Generative models like the Denoising Diffusion Probabilistic Model (DDPM) and its variants pDDPM, mDDPM, and cDDPM have recently emerged as powerful alternatives for unsupervised anomaly detection in brain MRI scans. These methods leverage frame-level labels of healthy brains to generate healthy tissues in brain MRI scans. During inference, when an anomalous (or unhealthy) scan image is presented as input, these models generate a corresponding healthy scan image, and the difference map between the generated healthy image and the original anomalous image provides the necessary pixel-level identification of abnormal tissues. The healthy images generated by the DDPM, pDDPM, and mDDPM models, however, suffer from fidelity issues and contain artifacts that have no medical significance. While cDDPM achieves slightly better fidelity and artifact suppression, it requires a huge memory footprint and is computationally more expensive than the other DDPM-based models. In this work, we propose an improved version of DDPM called the Multichannel Conditional Denoising Diffusion Probabilistic Model (MCDDPM) for unsupervised anomaly detection in brain MRI scans. Our proposed model achieves high fidelity by making use of additional information from the healthy images during the training process, enriching the representation power of DDPM models, with computational cost and memory requirements on par with the DDPM, pDDPM, and mDDPM models. Experimental results on multiple datasets (e.g., BraTS20, BraTS21) demonstrate promising performance of the proposed method. The code is available at https://github.com/vivekkumartri/MCDDPM.
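
The inference step described in this abstract (compare the input scan with its generated healthy counterpart and read off the abnormal pixels) can be sketched roughly as follows. This is a minimal illustration, not the MCDDPM code: the `anomaly_map` helper, the threshold value, and the toy images are assumptions, and the healthy reconstruction is taken as given rather than produced by a diffusion model.

```python
def anomaly_map(scan, healthy_reconstruction, threshold=0.5):
    """Return (residual, mask) for two equally sized 2D images.

    `threshold` is a hypothetical cut-off on the absolute pixel difference.
    """
    residual = [
        [abs(s - h) for s, h in zip(scan_row, healthy_row)]
        for scan_row, healthy_row in zip(scan, healthy_reconstruction)
    ]
    mask = [[value > threshold for value in row] for row in residual]
    return residual, mask


# Toy 4x4 images: uniform "healthy" intensity, plus one bright lesion pixel.
healthy = [[0.2] * 4 for _ in range(4)]
scan = [row[:] for row in healthy]
scan[1][2] = 1.0  # simulated anomaly

residual, mask = anomaly_map(scan, healthy)
flagged = [(r, c) for r, row in enumerate(mask) for c, hit in enumerate(row) if hit]
print(flagged)
```

In practice the threshold would be tuned on validation data, and the residual is often post-processed before thresholding; neither step is specified in the abstract.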

anomaly detection, dataset, mcddpm, (13 more...)

arXiv.org Artificial Intelligence

2409.19623

Country:

Europe > Slovenia > Central Slovenia > Municipality of Ljubljana > Ljubljana (0.04)
Asia > India > Maharashtra > Mumbai (0.04)

Genre: Research Report (0.64)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Health Care Technology (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.87)

Factor-Conditioned Speaking-Style Captioning

Ando, Atsushi, Moriya, Takafumi, Horiguchi, Shota, Masumura, Ryo

arXiv.org Artificial Intelligence, Jun-27-2024

This paper presents a novel speaking-style captioning method that generates diverse descriptions while accurately predicting speaking-style information. Conventional learning criteria directly use original captions that contain not only speaking-style factor terms but also syntax words, which interferes with learning speaking-style information. To solve this problem, we introduce factor-conditioned captioning (FCC), which first outputs a phrase representing speaking-style factors (e.g., gender and pitch), and then generates a caption, ensuring that the model explicitly learns speaking-style factors. We also propose greedy-then-sampling (GtS) decoding, which first predicts speaking-style factors deterministically to guarantee semantic accuracy, and then generates a caption based on factor-conditioned sampling to ensure diversity. Experiments show that FCC outperforms the original caption-based training, and with GtS, it generates more diverse captions while keeping style prediction performance.
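
The two-stage idea behind greedy-then-sampling decoding can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the factor probabilities, the caption templates standing in for a learned caption distribution, and the function names are hypothetical and do not come from the paper.

```python
import random

# Hypothetical per-factor probabilities a model might output for one utterance.
FACTOR_PROBS = {
    "gender": {"female": 0.9, "male": 0.1},
    "pitch": {"high": 0.7, "low": 0.3},
}

# Hypothetical templates standing in for the factor-conditioned caption distribution.
TEMPLATES = [
    "A {gender} speaker talks in a {pitch} pitch.",
    "The voice is {gender} with {pitch} pitch.",
    "{pitch} pitched speech from a {gender} voice.",
]


def greedy_factors(factor_probs):
    """Stage 1: pick the most probable value of each factor deterministically."""
    return {name: max(probs, key=probs.get) for name, probs in factor_probs.items()}


def sample_caption(factors, rng):
    """Stage 2: sample a caption conditioned on the already fixed factors."""
    return rng.choice(TEMPLATES).format(**factors)


def greedy_then_sampling(factor_probs, n_captions, seed=0):
    rng = random.Random(seed)
    factors = greedy_factors(factor_probs)
    captions = [sample_caption(factors, rng) for _ in range(n_captions)]
    return factors, captions


factors, captions = greedy_then_sampling(FACTOR_PROBS, n_captions=5)
for caption in captions:
    print(caption)
```

Because the factors are fixed before any sampling happens, every sampled caption agrees on the style information while the wording is free to vary.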

caption, speech encoder, style factor, (15 more...)

arXiv.org Artificial Intelligence

2406.18910

Country: Asia > Japan (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.47)

On the Limits of Multi-modal Meta-Learning with Auxiliary Task Modulation Using Conditional Batch Normalization

Armengol-Estapé, Jordi, Michalski, Vincent, Kumar, Ramnath, St-Charles, Pierre-Luc, Precup, Doina, Kahou, Samira Ebrahimi

arXiv.org Artificial Intelligence, May-30-2024

Few-shot learning aims to learn representations that can tackle novel tasks given a small number of examples. Recent studies show that cross-modal learning can improve representations for few-shot classification. More specifically, language is a rich modality that can be used to guide visual learning. In this work, we experiment with a multi-modal architecture for few-shot learning that consists of three components: a classifier, an auxiliary network, and a bridge network. While the classifier performs the main classification task, the auxiliary network learns to predict language representations from the same input, and the bridge network transforms high-level features of the auxiliary network into modulation parameters for layers of the few-shot classifier using conditional batch normalization. The bridge should encourage a form of lightweight semantic alignment between language and vision which could be useful for the classifier. However, after evaluating the proposed approach on two popular few-shot classification benchmarks we find that a) the improvements do not reproduce across benchmarks, and b) when they do, the improvements are due to the additional compute and parameters introduced by the bridge network. We contribute insights and recommendations for future work in multi-modal meta-learning, especially when using language representations.
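
Conditional batch normalization, as used by the bridge network in this abstract, can be sketched in a few lines. This is a simplified illustration under stated assumptions: the bridge is reduced to a hypothetical linear map (`w_gamma`, `w_beta`), the features are plain Python lists of shape batch x channels, and none of the names come from the paper.

```python
import math


def batch_norm_stats(batch):
    """Per-channel mean and (biased) variance over the batch dimension."""
    n = len(batch)
    channels = len(batch[0])
    means = [sum(x[c] for x in batch) / n for c in range(channels)]
    variances = [
        sum((x[c] - means[c]) ** 2 for x in batch) / n for c in range(channels)
    ]
    return means, variances


def bridge(aux_features, w_gamma, w_beta):
    """Hypothetical linear bridge: auxiliary features -> per-channel (gamma, beta).

    gamma is predicted as an offset from 1, so zero weights give plain batch norm.
    """
    gamma = [1.0 + sum(w * a for w, a in zip(row, aux_features)) for row in w_gamma]
    beta = [sum(w * a for w, a in zip(row, aux_features)) for row in w_beta]
    return gamma, beta


def conditional_batch_norm(batch, aux_batch, w_gamma, w_beta, eps=1e-5):
    """Normalize each channel, then scale and shift with example-specific parameters."""
    means, variances = batch_norm_stats(batch)
    out = []
    for x, aux in zip(batch, aux_batch):
        gamma, beta = bridge(aux, w_gamma, w_beta)
        out.append([
            gamma[c] * (x[c] - means[c]) / math.sqrt(variances[c] + eps) + beta[c]
            for c in range(len(x))
        ])
    return out


batch = [[1.0, 2.0], [3.0, 6.0]]   # classifier features: 2 examples, 2 channels
aux_batch = [[2.0], [2.0]]         # auxiliary-network features: 1 dimension
zeros = [[0.0], [0.0]]             # weight matrices: channels x aux dimension

plain = conditional_batch_norm(batch, aux_batch, zeros, zeros)
shifted = conditional_batch_norm(batch, aux_batch, zeros, [[1.0], [0.0]])
print(plain)
print(shifted)
```

With zero bridge weights the layer reduces to ordinary batch normalization, which makes it easy to check how much the modulation itself contributes.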

few-shot learning, learning, normalization, (16 more...)

arXiv.org Artificial Intelligence

2405.18751

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > Canada > Quebec > Montreal (0.14)
North America > Canada > Alberta > Census Division No. 6 > Calgary Metropolitan Region > Calgary (0.04)
Europe > Belgium (0.04)

Genre: Research Report > New Finding (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

An Integration of Pre-Trained Speech and Language Models for End-to-End Speech Recognition

Hono, Yukiya, Mitsuda, Koh, Zhao, Tianyu, Mitsui, Kentaro, Wakatsuki, Toshiaki, Sawada, Kei

arXiv.org Artificial Intelligence, Dec-6-2023

Advances in machine learning have made it possible to perform various text and speech processing tasks, including automatic speech recognition (ASR), in an end-to-end (E2E) manner. Since typical E2E approaches require large amounts of training data and resources, leveraging pre-trained foundation models instead of training from scratch is gaining attention. Although there have been attempts to use pre-trained speech and language models in ASR, most of them are limited to using one or the other. This paper explores the potential of integrating a pre-trained speech representation model with a large language model (LLM) for E2E ASR. The proposed model enables E2E ASR by generating text tokens in an autoregressive manner via speech representations as speech prompts, taking advantage of the vast knowledge provided by the LLM. Furthermore, the proposed model can incorporate remarkable developments for LLM utilization, such as inference optimization and parameter-efficient domain adaptation. Experimental results show that the proposed model achieves performance comparable to modern E2E ASR models.

arxiv preprint arxiv, language model, llm, (14 more...)

arXiv.org Artificial Intelligence

2312.03668

Country: Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Traversing Between Modes in Function Space for Fast Ensembling

Yun, EungGu, Lee, Hyungi, Nam, Giung, Lee, Juho

arXiv.org Artificial Intelligence, Jun-20-2023

Deep ensemble is a simple yet powerful way to improve the performance of deep neural networks. Under this motivation, recent works on mode connectivity have shown that parameters of ensembles are connected by low-loss subspaces, and one can efficiently collect ensemble parameters in those subspaces. While this provides a way to efficiently train ensembles, for inference, multiple forward passes should still be executed using all the ensemble parameters, which often becomes a serious bottleneck for real-world deployment. In this work, we propose a novel framework to reduce such costs. Given a low-loss subspace connecting two modes of a neural network, we build an additional neural network that predicts the output of the original neural network evaluated at a certain point in the low-loss subspace. The additional neural network, which we call a "bridge", is a lightweight network that takes minimal features from the original network and predicts outputs for the low-loss subspace without forward passes through the original network. We empirically demonstrate that we can indeed train such bridge networks and significantly reduce inference costs with the help of bridge networks.

artificial intelligence, bridge network, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2306.11304

Country:

Asia > South Korea > Seoul > Seoul (0.04)
North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
Asia > South Korea > Daejeon > Daejeon (0.04)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)

Learning to Summarize and Answer Questions about a Virtual Robot's Past Actions

DeChant, Chad, Akinola, Iretiayo, Bauer, Daniel

arXiv.org Artificial Intelligence, Jun-16-2023

When robots perform long action sequences, users will want to easily and reliably find out what they have done. We therefore demonstrate the task of learning to summarize and answer questions about a robot agent's past actions using natural language alone. A single system with a large language model at its core is trained to both summarize and answer questions about action sequences given ego-centric video frames of a virtual robot and a question prompt. To enable training of question answering, we develop a method to automatically generate English-language questions and answers about objects, actions, and the temporal order in which actions occurred during episodes of robot action in the virtual environment. Training one model to both summarize and answer questions enables zero-shot transfer of representations of objects learned through question answering to improved action summarization.
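
Automatic generation of questions and answers from an action log, as described in this abstract, might look roughly like the template-based sketch below. The episode format (an ordered list of verb and object pairs), the question templates, and the `generate_qa` function are assumptions for illustration; the paper's actual generation method is not specified here.

```python
def generate_qa(actions):
    """Build (question, answer) pairs from an ordered list of (verb, object) steps.

    Produces one yes/no question per step and one temporal-order question
    for every step that has a successor.
    """
    qa_pairs = []
    for index, (verb, obj) in enumerate(actions):
        qa_pairs.append((f"Did you {verb} the {obj}?", "yes"))
        if index + 1 < len(actions):
            next_verb, next_obj = actions[index + 1]
            question = f"What did you do right after the step '{verb} the {obj}'?"
            qa_pairs.append((question, f"{next_verb} the {next_obj}"))
    return qa_pairs


# Hypothetical episode from a virtual household environment.
episode = [("pick up", "mug"), ("open", "fridge"), ("put down", "mug")]
qa_pairs = generate_qa(episode)
for question, answer in qa_pairs:
    print(f"Q: {question}  A: {answer}")
```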

artificial intelligence, large language model, natural language, (19 more...)

arXiv.org Artificial Intelligence

2306.09922

Country: North America > United States > New York (0.04)

Genre:

Research Report (0.64)
Workflow (0.55)

Technology:

Information Technology > Artificial Intelligence > Robots > Humanoid Robots (0.70)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.70)

Bridge Networks

Olin-Ammentorp, Wilkie, Bazhenov, Maxim

arXiv.org Artificial Intelligence, Jun-15-2021

Despite rapid progress, current deep learning methods face a number of critical challenges. These include high energy consumption, catastrophic forgetting, dependence on global losses, and an inability to reason symbolically. By combining concepts from information bottleneck theory and vector-symbolic architectures, we propose and implement a novel information processing architecture, the 'Bridge network.' We show this architecture provides unique advantages which can address the problem of global losses and catastrophic forgetting. Furthermore, we argue that it provides a further basis for increasing energy efficiency of execution and the ability to reason symbolically.

architecture, bridge network, information, (16 more...)

arXiv.org Artificial Intelligence

2106.08446

Country:

North America > United States > Texas > El Paso County > El Paso (0.05)
North America > United States > California > San Diego County > San Diego (0.05)
North America > United States > New York > New York County > New York City (0.05)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Motion-Attentive Transition for Zero-Shot Video Object Segmentation

Zhou, Tianfei, Wang, Shunzhou, Zhou, Yi, Yao, Yazhou, Li, Jianwu, Shao, Ling

arXiv.org Artificial Intelligence, Jul-9-2020

In this paper, we present a novel Motion-Attentive Transition Network (MATNet) for zero-shot video object segmentation, which provides a new way of leveraging motion information to reinforce spatio-temporal object representation. An asymmetric attention block, called Motion-Attentive Transition (MAT), is designed within a two-stream encoder, which transforms appearance features into motion-attentive representations at each convolutional stage. In this way, the encoder becomes deeply interleaved, allowing for close hierarchical interactions between object motion and appearance. This is superior to the typical two-stream architecture, which treats motion and appearance separately in each stream and often suffers from overfitting to appearance information. Additionally, a bridge network is proposed to obtain a compact, discriminative and scale-sensitive representation for multi-level encoder features, which is further fed into a decoder to achieve segmentation results. Extensive experiments on three challenging public benchmarks (i.e., DAVIS-16, FBMS and YouTube-Objects) show that our model achieves compelling performance against the state of the art.

large language model, machine learning, segmentation, (18 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/TIP.2020.3013162

2003.04253

Country:

Asia > China > Beijing > Beijing (0.04)
Asia > Middle East > UAE (0.04)
Asia > China > Jiangsu Province > Nanjing (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.61)

Brigitte, a Bridge-Based Grid Path-Finder

Grastien, Alban (Data61)

AAAI Conferences, Jul-11-2019

We present BRIGITTE, a new path-finding algorithm for 8-connected grids. It is based on the notion of a bridge that we define here, i.e., a high-level description of paths between all pairs of points from two convex regions that allows fast distance queries and fast generation of the prefix and suffix of these paths. BRIGITTE uses a pre-processing step to first partition the map into convex regions and then compute a sufficient set of bridges between every pair of regions. Path-finding is then performed by looking up the regions of the source and target cells and then iterating over the bridges of the pair of regions to determine which one yields the shortest path. BRIGITTE compares favourably with CH-SG-R and Copp, although this currently comes at the price of extensive pre-processing.
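
The query phase described in this abstract (look up the two regions, then iterate over their bridges and keep the best) can be sketched as below. This is a heavily simplified illustration, not the BRIGITTE algorithm itself: a bridge is reduced to a hypothetical triple of entry cell, exit cell, and precomputed cost, and each region is assumed to be obstacle-free so that the octile distance gives the exact in-region path length on an 8-connected grid.

```python
import math


def octile(a, b):
    """Shortest path length between two cells on an obstacle-free 8-connected grid."""
    dx, dy = abs(a[0] - b[0]), abs(a[1] - b[1])
    return max(dx, dy) + (math.sqrt(2) - 1) * min(dx, dy)


def query(source, target, region_of, bridges):
    """Return the best distance found via the precomputed bridges.

    `region_of` maps a cell to its region id; `bridges` maps a pair of region
    ids to a list of (entry_cell, exit_cell, cost) triples. Both structures
    are simplifying assumptions made for this sketch.
    """
    source_region, target_region = region_of[source], region_of[target]
    if source_region == target_region:
        return octile(source, target)
    best = math.inf
    for entry, exit_cell, cost in bridges.get((source_region, target_region), []):
        candidate = octile(source, entry) + cost + octile(exit_cell, target)
        best = min(best, candidate)
    return best


# Toy map with two regions, "A" and "B", joined by two candidate bridges.
region_of = {(0, 0): "A", (3, 4): "A", (6, 0): "B"}
bridges = {
    ("A", "B"): [
        ((1, 1), (5, 1), 4.0),   # short corridor
        ((1, 0), (5, 0), 10.0),  # long detour
    ]
}

distance = query((0, 0), (6, 0), region_of, bridges)
print(round(distance, 3))
```

The query cost depends only on the number of bridges stored for the region pair, which is where the trade-off against pre-processing effort comes from.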

bridge, grid, brigitte, (15 more...)

AAAI Conferences

Twelfth Annual Symposium on Combinatorial Search

Country: Oceania > Australia (0.05)

Technology: Information Technology > Artificial Intelligence (0.48)
